Constrained Policy Optimization

Authors

Joshua Achiam

David Held

Aviv Tamar

Pieter Abbeel

Abstract

For many applications of reinforcement learning, it can be more convenient to specify both a reward function and constraints, rather than trying to design behavior through the reward function alone. For example, systems that physically interact with or around humans should satisfy safety constraints. Recent advances in policy search algorithms (Mnih et al., 2016; Schulman et al., 2015; Lillicrap et al., 2016; Levine et al., 2016) have enabled new capabilities in high-dimensional control, but do not consider the constrained setting. We propose Constrained Policy Optimization (CPO), the first general-purpose policy search algorithm for constrained reinforcement learning with guarantees for near-constraint satisfaction at each iteration. Our method allows us to train neural network policies for high-dimensional control while making guarantees about policy behavior throughout training. Our guarantees are based on a new theoretical result, which is of independent interest: we prove a bound relating the expected returns of two policies to an average divergence between them. We demonstrate the effectiveness of our approach on simulated robot locomotion tasks where the agent must satisfy constraints motivated by safety.
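The abstract states the bound only in words. For reference, it can be written in the form below, where $J(\pi)$ is the expected discounted return of policy $\pi$, $\gamma$ the discount factor, $d^{\pi}$ the discounted state distribution under $\pi$, $A^{\pi}$ the advantage function of $\pi$, and $D_{TV}(\pi' \| \pi)[s]$ the total variation distance between the two policies' action distributions at state $s$. This form is reproduced from the full paper rather than from the text above, so treat the exact constants as an assumption to check against the full text. The divergence enters as an expectation over states visited by $\pi$, which is what makes it an "average divergence" rather than a worst case over all states.

```latex
J(\pi') - J(\pi) \;\ge\; \frac{1}{1-\gamma}\,
  \mathop{\mathbb{E}}_{\substack{s \sim d^{\pi} \\ a \sim \pi'}}
  \left[ A^{\pi}(s,a) - \frac{2\gamma\,\epsilon^{\pi'}}{1-\gamma}\, D_{TV}(\pi' \,\|\, \pi)[s] \right],
\qquad
\epsilon^{\pi'} = \max_{s} \left| \mathop{\mathbb{E}}_{a \sim \pi'}\left[ A^{\pi}(s,a) \right] \right|
```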
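To make "policy search with near-constraint satisfaction at each iteration" more concrete, the sketch below solves one update subproblem of a trust-region style constrained method with a single constraint. It assumes, as an illustration not taken from the text above, that the return and the constraint cost are linearized around the current parameters (gradient `g` of the return, gradient `b` of the constraint cost, current constraint value `c = J_C(pi_k) - d`, where non-positive means satisfied) and that the policy divergence is approximated by the quadratic `0.5 * x^T H x <= delta` with a diagonal, positive `H`. The function `cpo_step` and all of its parameter names are hypothetical. Rather than a dual or conjugate-gradient solver, it solves this small problem geometrically after rescaling by `H^(-1/2)`: take the plain trust-region step if it satisfies the constraint, otherwise the best point on the constraint boundary, and if no point in the trust region is feasible, an assumed recovery step that only decreases the constraint. Gradient estimation, line search, and the neural network policy itself are left out.

```python
import math
from typing import List, Sequence


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def cpo_step(g: Sequence[float], b: Sequence[float], c: float,
             h_diag: Sequence[float], delta: float) -> List[float]:
    """Hypothetical sketch: return the step x = theta - theta_k that solves
        maximize g.x  subject to  c + b.x <= 0  and  0.5 * x^T H x <= delta,
    with H = diag(h_diag) and every entry of h_diag > 0."""
    # Change of variables y = H^(1/2) x turns the quadratic trust region into a ball.
    inv_sqrt_h = [1.0 / math.sqrt(h) for h in h_diag]
    gs = [s * gi for s, gi in zip(inv_sqrt_h, g)]  # H^(-1/2) g
    bs = [s * bi for s, bi in zip(inv_sqrt_h, b)]  # H^(-1/2) b
    radius = math.sqrt(2.0 * delta)                # ||y|| <= radius
    g_norm = math.sqrt(_dot(gs, gs))
    b_norm2 = _dot(bs, bs)

    # Case 1: plain trust-region step along the rescaled objective gradient.
    y = [radius * gi / g_norm for gi in gs] if g_norm > 0.0 else [0.0] * len(gs)
    if b_norm2 == 0.0 or c + _dot(bs, y) <= 0.0:
        # Either the step satisfies the linearized constraint, or b == 0 and
        # no step can change the constraint value, so keep the plain step.
        return [s * yi for s, yi in zip(inv_sqrt_h, y)]

    # Point on the constraint boundary {y : c + bs.y = 0} closest to the origin.
    y_plane = [-c * bi / b_norm2 for bi in bs]
    slack = radius ** 2 - _dot(y_plane, y_plane)

    if slack < 0.0:
        # Case 3: no feasible point inside the trust region. Assumed recovery
        # step: move as far as allowed in the direction that lowers the constraint.
        y = [-radius * bi / math.sqrt(b_norm2) for bi in bs]
    else:
        # Case 2: the constraint is active at the optimum. Move along the
        # boundary in the direction of the gradient projected onto it.
        coef = _dot(gs, bs) / b_norm2
        g_perp = [gi - coef * bi for gi, bi in zip(gs, bs)]
        perp_norm = math.sqrt(_dot(g_perp, g_perp))
        scale = math.sqrt(slack) / perp_norm if perp_norm > 0.0 else 0.0
        y = [p + scale * gi for p, gi in zip(y_plane, g_perp)]

    # Map back to parameter space: x = H^(-1/2) y.
    return [s * yi for s, yi in zip(inv_sqrt_h, y)]


if __name__ == "__main__":
    step = cpo_step(g=[1.0, 1.0], b=[1.0, 0.0], c=0.0, h_diag=[1.0, 1.0], delta=0.5)
    print(step)
```

In the usage example the unconstrained step, roughly `[0.707, 0.707]`, would raise the constraint cost, so the sketch returns `[0.0, 1.0]`: it spends the whole trust region in the direction the constraint does not penalize. The geometric route was chosen only because it is short and exact for a diagonal `H`; a general `H` would need a different solver.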

Similar resources

Stock Portfolio-Optimization Model by Mean-Semi-Variance Approach Using of Firefly Algorithm and Imperialist Competitive Algorithm

Selecting approaches with appropriate accuracy and suitable speed for decision making is one of the challenges managers face. Investment decisions are also among managers' main decisions; they include securities transactions in financial markets, which are one investment approach. When some assets and barriers of the real world have been considered, optimization of...

A chance-constrained multi-objective model for final assembly scheduling in ATO systems with uncertain sub-assembly availability

A chance-constrained multi-objective model under uncertainty in the availability of subassemblies is proposed for scheduling in ATO systems. The on-time delivery of customer orders as well as reducing the company's cost is crucial; therefore, a three-objective model is proposed including the minimization of 1) overtime, idle time, change-over, and setup costs, 2) total dispersion of items’ deliver...

On the hybrid conjugate gradient method for solving fuzzy optimization problem

In this paper we consider a constrained optimization problem where the objectives are fuzzy functions (fuzzy-valued functions). The fuzzy constrained optimization (FO) problem plays an important role in many fields, including mathematics, engineering, and statistics. On the other hand, in real situations it is important to know how one may obtain a numerical solution of a given interesting...

Resource Allocation and Multiagent Policy Formulation for Resource-Limited Agents Under Uncertainty

The problem of optimal policy formulation for teams of resource-limited agents in stochastic environments is composed of two strongly coupled subproblems: a resource allocation problem and a policy optimization problem, both of which have individually received a significant amount of attention. We show how to combine the two problems into a single constrained optimization problem that yields optim...

Quasi-Newton Methods for Nonconvex Constrained Multiobjective Optimization

Here, a quasi-Newton algorithm for constrained multiobjective optimization is proposed. Under suitable assumptions, global convergence of the algorithm is established.

Constrained Markov Decision Models with Weighted Discounted Rewards

This paper deals with constrained optimization of Markov Decision Processes. Both objective function and constraints are sums of standard discounted rewards, but each with a different discount factor. Such models arise, e.g. in production and in applications involving multiple time scales. We prove that if a feasible policy exists, then there exists an optimal policy which is (i) stationary (non...

Publication date: 2017
